Green building (GB) is a design concept that implements sustainable 
processes and green technologies in the building's life cycle. However, the 
design process of GB tends to take longer than conventional buildings due to 
the integration of various green requirements and performances into the 
building design. Advanced artificial intelligence (AI) methods such as 
machine learning (ML) are widely used to help designers do their jobs faster 
and more accurately. Therefore, this study aims to develop a GB design 
predictive model utilizing ML techniques that consider four GB design 
criteria: energy efficiency, indoor environmental quality, water efficiency, 
and site planning. A dataset of GB projects collected from a private 
construction company based in Jakarta was used to train and test the ML 
model. The accuracy of the models was evaluated using mean square error 
(MSE). The comparison of MSE values of the conducted experiments 
showed that the combination of the artificial neural network (ANN) method 
with the IF-ELSE algorithm created the most accurate ML model for GB 
design prediction with an MSE of 1.3. 


1. INTRODUCTION 

The building and construction industry is known to have negative impacts on the environment 
through excessive consumption of natural resources [1]. Furthermore, the building sector contributes more 
than 40% of greenhouse gas emissions and consumes no less than 40% of global energy production [2]. 
Practitioners, professionals, and academics from the building and construction industry have attempted to 
find alternative approaches to practice energy conservation in the building life cycle. One of these efforts is 
implementing the green building (GB) concept [3]. The GB concept refers to environmentally friendly and 
sustainable principles implemented throughout a building's life cycle, from early project planning through 
operation and maintenance to the decommissioning phase. It has been widely perceived as a strategy to 
minimize energy usage in the building and construction sector [4], [5]. The GB concept applies principles 
and technologies to buildings throughout their life cycle to achieve sustainability goals, such as minimizing 
the negative impacts on the environment caused by buildings and the human activities inside them [6], [7]. 

Decisions made at the initial building design stage can significantly affect the environment [8]. 
However, because various design aspects and building performances must be set to achieve sustainability 
optimally, the design of GB tends to be more complex than that of conventional buildings [9]. Consequently, 
the design process of GB can take longer because it requires multidisciplinary teamwork in which the 
team members must incorporate each GB aspect into the design [10], [11]. 

Technological advances that enable digitization, automation, and integration in the project life cycle 
have helped construction transform into a technology-driven industry by generating integrated systems and 
simplifying complex mechanisms, which makes the decision-making process more efficient [12], [13]. 
Furthermore, technology implementation has been shown to increase productivity in GB projects [14]. 
Machine learning (ML) is a technique that equips a system with the ability to learn and improve through its 
own experience without being explicitly programmed [15]. It has been extensively researched and applied in 
the building life cycle [16], [17]. In the building design stage, this approach has been developed to optimize 
the building performance of GB design. 

Studies conducted in the past few years on ML utilization in the building design process have 
contributed substantially to the adoption of digital technology in building design and have changed how the 
entire design process is performed [18], [19]. For example, a study [20] used the artificial neural network 
(ANN) method to develop an ML model that predicts energy performance in office buildings reliably, with a 
computation time 50 times faster than standard building performance simulation tools. Similarly, Rahman 
and Smith [21] employed statistical neural network and Gaussian regression algorithms to develop an ML 
model for predicting fuel consumption in a commercial building, which was reported to achieve better 
accuracy. 

Furthermore, Geyer and Singaravel [22] developed a component-based ML model using the ANN 
method to predict thermal energy performance in office buildings. The computation time required to generate 
the prediction was drastically reduced, with a prediction error of less than 3.9%. This is in line with another 
study that compared the ANN and regression methods for indoor air thermal condition prediction in 
residential buildings. Its results showed that even though the ML model with ANN takes time and needs 
much data to develop, it yields more accurate predictions [23]. A framework developed using ANN 
algorithms to predict building performance at the design stage based on the interaction between buildings 
and humans was also shown to give improved estimates [24]. 

These previous studies showed that predictive models using ML methods could significantly reduce 
the computation time required in the design process, increasing the productivity of architects and engineers 
designing GB. Despite these developments, however, there is still minimal evidence of the ML approach 
being used to develop a prediction model for the design of GB. To address this gap, this study attempts to 
create a design prediction model for GB using the ANN method as one of the ML techniques. This paper is 
expected to provide references and insights for building practitioners regarding the utilization of the ML 
approach in increasing the time efficiency of the GB design process, which can contribute significantly to 
the acceleration of technology-based development in the building and construction sector. 


2. METHOD 

This study was done in two stages to develop a predictive ML model for the design of GB as shown 
in Figure 1. The first stage is defining the GB design variables in the form of GB criteria and indicators used 
as parameters for the input and output of the ML model. These variables were obtained by performing a 
literature study of relevant research on GB published in the last five years, such as [22], [25]-[34], as well as 
GB assessment tools [35], [36], and regulations [37]. 


[Figure: workflow diagram with the steps Literature Study on GB, GB Design Variables, GB Data Collection, 
GB Data Synthetization, Data Preprocessing, and ANN algorithm modelling and selection] 


Figure 1. Research workflow 


The experiments for the ML model development were performed in the second stage using the 
design variables and parameters obtained as the features for the ML model. Before the predetermined design 
variables were fed into the experiments, a preprocessing step was performed to prepare the data. 
Furthermore, the upper and lower limits for the value of each variable were determined [38], based on the 
GB regulations applied both in Indonesia and in other countries, as well as on the archive analysis carried 
out on various documents discussing the design criteria of GB. The existing data on the GB projects, the 
ANN method, and the IF-ELSE statements used to develop the ML model are explained in the 
following sections. 


2.1. Green building data collection 

The historical data of GB design parameters used in this study were initially collected from a 
construction company based in Jakarta, Indonesia. However, due to the Non-Disclosure Agreement (NDA) 
between the contractor and the owners of the GB projects, the actual data could not be fully provided. 
Consequently, additional synthetic data were engineered to complete the data of the historical GB projects. 
Synthetic data are artificial data generated to maintain privacy in data sharing; here they were used as 
training and testing data for ML model development [39], [40]. This data acquisition method has been used 
in ML development when the required data are not publicly accessible [41]. Synthetic data can be generated 
by adding to actual data or by synthesizing the data entirely [42]. The synthetic data for ML model training 
and testing should be representative of the original dataset and based on existing standards [40]. 

The synthetic data were built based on the ranges of parameter values obtained from the Green 
Building Council Indonesia (GBCI), Jakarta Governor Regulation No. 38 of 2012 on Green Buildings, and 
the Jakarta Green Building User Guide issued by the Jakarta provincial government. Furthermore, the 
Building Research Establishment Environmental Assessment Method (BREEAM) was also used as a 
reference for meeting data requirements [36]. 


2.2. Data analysis 

Content analysis was the data analysis technique used to determine the variables and draw 
conclusions from the literature study. It is a well-established analytical technique for qualitative data that 
uses a systematic process to draw inferences from text [43]. Because the relevant studies hold diverse 
views and perceptions, the analysis results are presented in tabular form, organized in 
four columns: criteria, variables, indicators, and references. 

The missing data from the collected building data for ML training and testing were then completed 
by creating synthetic data using estimated ranges of values derived from the applicable GB standards and 
regulations. The Microsoft Excel spreadsheet functions used in this data preprocessing step are the random 
functions shown in (1) and (2). 


= RANDBETWEEN (lower limit, upper limit) (1) 


The random function in (1) was used to process data given in the form of minimum and maximum 
standards. The function generates a random integer between the two defined limits, inclusive. 
Meanwhile, if the standard is a decimal number, the following function is used instead, 


= RAND() * (upper limit - lower limit) + lower limit (2) 
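
The two spreadsheet functions in (1) and (2) can be mirrored in Python's standard library. The following is a minimal sketch, not the authors' procedure: the function names, the fixed seed, and the variable names are assumptions introduced for illustration, while the ranges are the WWR and toilet flush limits quoted in this paper.

```python
import random


def synth_integer(lower, upper, rng):
    """Counterpart of RANDBETWEEN: random integer in [lower, upper], both ends included."""
    return rng.randint(lower, upper)


def synth_decimal(lower, upper, rng):
    """Counterpart of RAND() * (upper - lower) + lower: random float from lower up to upper."""
    return rng.random() * (upper - lower) + lower


# Fixed seed (an assumption) so that repeated runs draw the same values.
rng = random.Random(42)

# Hypothetical variable names; ranges follow the standards cited in the text.
wwr_percent = synth_integer(20, 27, rng)      # window wall ratio, %
toilet_flush = synth_decimal(3.0, 4.5, rng)   # toilet flush, L/flush

print(wwr_percent, round(toilet_flush, 2))
```

Integer-valued standards map to the first helper and decimal-valued standards to the second, which matches the distinction drawn between (1) and (2).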


2.3. ML algorithms 

The ML model development began by importing the dataset completed in the previous step into a 
Python 3.7 environment. The major specifications of the development environment were a 2.5 GHz Intel 
Core i5 CPU, an integrated Intel HD Graphics 4000 GPU with 1.5 GB (1536 MB) of VRAM, and 4 GB of 
RAM. The code was run in Google Colaboratory, a cloud service based on Jupyter Notebooks that 
disseminates machine learning research [44]. The packages used in this development were NumPy, pandas, 
Matplotlib, scikit-learn, TensorFlow, and Keras. Furthermore, the sklearn preprocessing package, which 
transforms raw datasets into a suitable representation, was used to standardize the data quickly and 
straightforwardly. 
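
Standardization rescales each feature to zero mean and unit variance so that large-valued features such as building area do not dominate small-valued ones. The sketch below reproduces that computation with the standard library only, as an illustration of what a standard scaler computes; it is not the preprocessing code used in this study, and the function name and sample values are assumptions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale one feature column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative building areas in square meters (hypothetical sample).
building_area = [13000.0, 153700.0, 60185.0, 50914.0, 47487.0]
scaled = standardize(building_area)

print([round(z, 3) for z in scaled])
```

In practice the mean and standard deviation would be computed on the training records only and then reused for the test records, so that no information from the test set leaks into training.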

The ML algorithms used in this experiment are the ANN and IF-ELSE algorithms. An ANN is an 
artificial adaptive system inspired by human brain processes [45]. Its essential elements are nodes, known as 
processing elements (PEs), and the connections between them. Each node receives its input from other 
nodes or from the environment and produces an output. Each node has a function that converts its overall 
input into that output. The nodes interact through the connections to generate the prediction. 
Because GB design prediction involves multiple inputs and outputs, the ANN was selected as an algorithm 
that can learn complex problems and provide predictions for them. Furthermore, it 
has a high degree of flexibility in representing data for regression [22]. 

Each connection is characterized by a strength between a pair of nodes, which takes a positive or 
negative value. A positive value means triggering, while a negative value means inhibiting [46]. The 
connections between nodes can modify themselves, and this dynamic drives a learning process across the 
entire ANN, which is a key mechanism that characterizes ANNs [47]. All PEs in the ANN are interconnected 
with connection weights, which are the basis of the ANN's learning capabilities. 

An ANN can acquire experiential knowledge from the data during the training process and provide 
accurate predictions [38]. It consists of three main layers: the input, hidden, and output layers. The 
hyperparameters, which include the number of hidden layers, the number of nodes in each hidden layer, and 
the activation function, can be adjusted to the model's requirements during model development to improve 
the quality of learning and provide an optimal model [48]. Since the ANN architecture can differ for each 
ML model, the model selected is the one with the lowest deviation rate. The advantages of ANNs are their 
representational capabilities and universal function approximation capabilities, which are offered by 
feedforward neural networks [49]. The function of the hidden neurons is to mediate between the external 
input and the output of the network and allow the network to extract higher-order statistics [50]. 
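
The node, weight, and layer mechanics described above can be illustrated with a small feedforward pass. This is a sketch with hypothetical weights and a toy 2-2-1 architecture, not the trained network of this study; it only shows how positive and negative connection weights, a ReLU activation in the hidden layer, and a linear output combine into a prediction.

```python
def relu(x):
    """Rectified linear unit: passes positive sums, blocks negative ones."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each node sums its weighted inputs plus a bias."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (ReLU) -> output layer (linear, for regression)."""
    hidden = dense(inputs, hidden_w, hidden_b, activation=relu)
    return dense(hidden, out_w, out_b)


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output.
hidden_w = [[0.5, -1.0], [1.0, 1.0]]  # the -1.0 weight inhibits the first hidden node
hidden_b = [0.0, -0.5]
out_w = [[2.0, 1.0]]
out_b = [0.1]

prediction = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(prediction)
```

With these inputs the first hidden node receives a negative sum and is switched off by the ReLU, so only the second hidden node contributes to the output. Training would adjust the weights and biases to reduce the prediction error.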

On the other hand, the IF-ELSE algorithm is commonly used to choose among alternatives based on 
conditions. It consists of blocks, each guarded by a different condition. If the IF condition is true, 
the block of statements under IF is executed. Otherwise, when the IF condition is 
false, the block of statements under ELSE is executed [51]. 

Among other metrics, the mean squared error (MSE) was used to evaluate the performance of the ML 
model developed in each experiment because of its theoretical relevance in statistical modeling and its 
sensitivity to outliers [52]. A model's MSE is the mean of the squared prediction errors over all instances in 
the test set, where the prediction error is the difference between the actual value and the predicted value 
[53]. MSE condenses the data and the model predictions into a single value that measures how well an ML 
model imitates reality, as given in (3). 


$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \qquad (3)$$


Where, 

n = number of items 
\sum = summation notation 
Y_i = actual value 
\hat{Y}_i = predicted value 
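
The MSE definition in (3) translates directly into code. The sketch below uses hypothetical actual and predicted values; the function name is an assumption.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


# Hypothetical test-set values.
actual = [3.0, 4.5, 1.5, 8.0]
predicted = [3.5, 4.0, 1.0, 7.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # 0.4375
```

The single error of 1.0 contributes 1.0 to the sum of squares, more than the three errors of 0.5 together (0.75), which illustrates the sensitivity of MSE to outliers.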


3. RESULTS AND DISCUSSION 
3.1. Green building criteria and indicators 

The GB criteria used in the ML model development were obtained from the literature study. There 
are several leading design factors frequently discussed in GB guidelines and scientific publications, including 
indoor environmental quality, energy, water, material, waste, site planning, and innovation [33], [54]. 
However, to achieve the objective of this study, four particular design criteria that can be quantified were 
selected as the features for the ML development. The sub-criteria and indicators for the GB design criteria 
were also determined as shown in Table 1. 


Table 1. GB indicators for predictive ML development 


Criteria                      Sub-criteria       Indicators               References 
Energy                        Building Geometry  Building Area            [22], [25] 
                                                 Building Orientation     [22], [25] 
                                                 Number of Floors         [22], [25], [26] 
                              Fenestration       Window Wall Ratio (WWR)  [22], [25], [26] 
                                                 Glazing Type             [27] 
Indoor Environmental Quality  Visual Comfort     Indoor Illuminance       [27], [35] 
                              Thermal Comfort    Air Temperature          [27]-[29], [35] 
                                                 Relative Humidity        [27]-[29], [35] 
                              Acoustic Comfort   Sound Level              [28], [29], [35] 
Water                         Water Usage        Washbasin                [30], [35] 
                                                 Toilet Flush             [30], [35] 
                                                 Urinals                  [30], [35] 
                              Water Recycling    Rainwater Harvesting     [31], [35] 
Site Planning                 Site Planning      Landscape Area           [35] 
                                                 Cyclist Facilities       [31], [34], [35] 


3.2. Parameters of green building design criteria 

There are two sub-criteria in the energy efficiency criteria, building geometry and fenestration, with 
five indicators whose parameters were determined. The parameters for the building geometry indicators, 
which include building area and the number of floors, were obtained from the historical data of the 
constructed GB. The GB projects covered office, school, apartment, mall, and industrial building functions. 
The parameters of the building orientation are south-north and east-west. 

The WWR must meet the standard range of 20%-27% [55], for which the overall thermal transfer 
value (OTTV) calculated from the WWR should not exceed 45 watts per square meter, as required in 
Governor Regulation No. 38 of 2012 Article 6. Within this range, the resulting OTTV is 35.06-43.82 W/m². 
Furthermore, the thermal performance of the glazing type is indicated by the U value, the measure of heat 
loss (or heat flow) per square meter of surface area per 1-degree (Kelvin) temperature difference. The U 
value for the glazing type refers to the Jakarta Green Building User Guide document, which states that the U 
values locally available in Indonesia are 4.94 W/m²K, 4.55 W/m²K, and 5.18 W/m²K. 

In the indoor environmental quality criteria, the air temperature and relative humidity indicators 
were based on the GBCI standards and Governor Regulation No. 38 of 2012 Article 8 regarding the 
benchmark for thermal comfort, which sets the design air temperature at 25 °C and the relative humidity at 
60%. As for indoor illuminance, the lighting levels for different building functions were based on the GBCI 
standards and the Governor Regulation, which refer to SNI-03-6197-2011 concerning Energy Conservation 
in Lighting Systems. Furthermore, the sound level was based on the GBCI standards, which refer to 
SNI-03-6386-2000 concerning Specifications for Sound Levels and Reverberation Time in Buildings and 
Housing. 

Parameters for washbasins, toilet flushes, and urinals in the water efficiency criteria were based on 
standards by GBCI and the Jakarta Green Building User Guide, which provide maximum water capacities of 
8 L/min for the washbasin, 4.5 L/flush for the toilet flush, and 1.5 L/flush for the urinal. In contrast, the 
minimum values are based on BREEAM UK: 3 L/min for the washbasin, 3 L/flush for the toilet flush, and 
0 L/flush for the urinal. According to Governor Regulation No. 38 of 2012 Article 22, the rainwater storage 
volume provided must be 5% of the ground floor area (GFA). 

Based on Jakarta Governor Regulation No. 38 of 2012 Article 21, the landscaping area of the 
building must be 15% of the GFA for 5-story buildings, 30% of the GFA for 9-story buildings, and 45% of 
the GFA for buildings higher than that. Moreover, referring to Article 25, bicycle parking facilities must 
provide at least one bicycle rack for every multiple of 2,500 square meters of building area. Table 2 
summarizes the standards required for the GB indicators. 
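
Regulatory thresholds of this kind lend themselves to IF-ELSE rules. The sketch below encodes the landscaping, rainwater storage, and bicycle rack requirements quoted above. The function names and the example building are hypothetical, and two interpretations are assumptions: the floor thresholds are read as "up to 5 floors" and "up to 9 floors", and one rack is counted per full 2,500 square meters.

```python
def landscape_area(gfa_m2, floors):
    """Required landscaping area as a share of the ground floor area (GFA)."""
    if floors <= 5:        # assumed reading: up to 5 floors
        percent = 15
    elif floors <= 9:      # assumed reading: 6 to 9 floors
        percent = 30
    else:                  # higher than 9 floors
        percent = 45
    return gfa_m2 * percent / 100


def rainwater_storage(gfa_m2):
    """Rainwater storage requirement: 5% of the ground floor area."""
    return gfa_m2 * 5 / 100


def bicycle_racks(building_area_m2):
    """One rack per full 2,500 square meters of building area (assumed rounding)."""
    return int(building_area_m2 // 2500)


# Hypothetical building: 7 floors, GFA 2,000 m2, building area 12,500 m2.
print(landscape_area(2000, 7))     # 600.0
print(rainwater_storage(2000))     # 100.0
print(bicycle_racks(12500))        # 5
```

Because each rule depends on only one or two inputs and has a fixed outcome, a rule table like this needs no training data, unlike the indicators that vary continuously within a range.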


Table 2. GB variables and parameter standards 


Variables                     Indicators               Min                           Max 
Energy                        Building Area            Historical building project data 
                              Building Orientation     South-North & East-West 
                              Number of Floors         Historical building project data 
                              Window Wall Ratio (WWR)  20%                           27% 
                              Glazing Type             4.54 W/m²K                    5.18 W/m²K 
Indoor Environmental Quality  Indoor Illuminance       Office & School: 350 lux 
                                                       Apartment: 150 lux 
                                                       Mall: 500 lux 
                                                       Industry: 500 lux 
                              Air Temperature          25 °C 
                              Relative Humidity        60% 
                              Sound Level              Office: 40 dB 
                                                       School: 35 dB 
                                                       Apartments: 45 dB 
                                                       Mall: 45 dB 
                                                       Industry: 50 dB 
Water                         Washbasin                3 L/min                       8 L/min 
                              Toilet Flush             3 L/flush                     4.5 L/flush 
                              Urinals                  0 L/flush                     1.5 L/flush 
                              Rainwater Harvesting     0.05 x ground floor area 
Site Planning                 Landscape Area           15% of the ground floor area  45% of the ground floor area 
                              Cyclist Facilities       One bicycle rack per 2,500 m² of building area 


3.3. Machine learning process 

A total of 62 constructed GB projects were used as the training and testing datasets, with 50 and 12 
records, respectively. The built datasets were then loaded into Google Colaboratory and processed by 
the ANN algorithm, as shown in Figure 2. The text data were converted into numeric data so that they 
could be read and processed. The feature extraction process converted the building function 
to numeric codes: (0) for apartment, (1) for industry, (2) for mall, (3) for school, and (4) for 
office. Likewise, the building orientation was converted to (0) for east-west and (1) for south-north. 
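
The encoding and the 50/12 split described above can be sketched as follows. The code mappings are those stated in the text; the record layout, the helper names, the label normalization, and the seeded shuffle before splitting are assumptions, since the paper does not state how the records were assigned to the two sets.

```python
import random

FUNCTION_CODES = {"apartment": 0, "industry": 1, "mall": 2, "school": 3, "office": 4}
ORIENTATION_CODES = {"east-west": 0, "south-north": 1}


def normalize(label):
    """Lower-case and drop spaces so that 'Office' and 'east - west' map cleanly."""
    return label.strip().lower().replace(" ", "")


def encode(record):
    """Return a copy of the record with text categories replaced by numeric codes."""
    return {
        **record,
        "function": FUNCTION_CODES[normalize(record["function"])],
        "orientation": ORIENTATION_CODES[normalize(record["orientation"])],
    }


def split(records, n_train, seed=0):
    """Shuffle reproducibly, then cut into training and testing records."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 62 placeholder records (hypothetical values, repeated only to reach 62).
records = [
    {"function": "Office", "orientation": "east - west", "floors": 9},
    {"function": "school", "orientation": "south-north", "floors": 7},
] * 31

encoded = [encode(r) for r in records]
train_set, test_set = split(encoded, n_train=50)
print(len(train_set), len(test_set))  # 50 12
```

Integer codes impose an artificial ordering on building functions; one-hot encoding is a common alternative when that ordering is not meaningful.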


[Figure: notebook output of df = pd.read_excel(...) followed by df.head(), previewing the dataset of 
62 rows x 19 columns. Legible columns include project name, building function, GFA, total building area, 
no. of basement, no. of floor, building orientation, glazing type, indoor lighting, temperature, humidity, 
noise, wash basin, and toilet flush.] 


Figure 2. Data input process 


Four experiments were conducted during the model development using the ANN algorithm to 
find the best predictive model. In the first attempt, all indicators were incorporated in the output layer, 
including WWR, glazing type, temperature, relative humidity, indoor illuminance, noise levels, washbasins, 
toilet flush, urinals, rainwater harvesting, landscaping areas, and cyclist facilities. The resulting MSE was 
very large, at 5,119,586. In the second attempt, the MSE decreased to 443,484 because another hidden 
layer was added, making the model more accurate in predicting data and reducing error. Moreover, in the 
second experiment, the data were split into two batches of 31 records each. 

In the third attempt, the model was tested for only one indicator, WWR, which yielded a lower MSE 
of 128.9. This occurred because the accumulated MSE came from only one indicator, whereas in the 
previous trial it was accumulated over all 12 indicators. Lastly, in the fourth attempt, with washbasin, toilet 
flush, and urinal as the prediction outputs, the model produced a lower error than in the previous trials. The 
model in the fourth experiment, with an MSE of 1.3, was therefore considered the best model for predicting 
the data. Accordingly, the ANN algorithm was used with building function, ground floor area, building 
area, number of basements, number of floors, and building orientation as inputs and the toilet flush, urinal, 
and washbasin indicators as outputs, as shown in Figure 3. The other indicators, which were not included in 
the ANN algorithm, were predicted using the IF/ELSE algorithm: WWR, type of glazing, indoor lighting 
level, temperature, relative humidity, noise level, rainwater harvesting, landscaping area, and cyclist 
facilities, as shown in Figure 4. The trial-and-error process of ML model development is 
summarized in Table 3. 
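
The drop in MSE when the number of outputs is reduced can be illustrated by computing the error per output. The values below are hypothetical, and the assumption that the reported figure accumulates the per-output errors by summation is ours; the point is only that an output measured in hundreds of square meters dominates outputs measured in liters per flush.

```python
def mse(actual, predicted):
    """Mean squared error for one output indicator."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (actual, predicted) values for three test buildings.
outputs = {
    "toilet_flush_l_per_flush": ([3.5, 4.0, 4.5], [3.0, 4.5, 4.0]),
    "wwr_percent": ([22.0, 25.0, 27.0], [20.0, 27.0, 25.0]),
    "landscape_area_m2": ([350.0, 600.0, 900.0], [300.0, 650.0, 850.0]),
}

per_output = {name: mse(a, p) for name, (a, p) in outputs.items()}
combined = sum(per_output.values())  # assumed accumulation across outputs

for name, value in per_output.items():
    print(f"{name}: MSE = {value}, share = {value / combined:.4f}")
print("combined:", combined)
```

Here the landscape area accounts for more than 99% of the combined error even though its relative error is comparable to that of the other outputs. Standardizing the outputs or reporting a per-output error would make experiments with different output sets easier to compare.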


Table 3. ML experiments 


Layer                Experiment 1           Experiment 2           Experiment 3  Experiment 4 
Input Layer          Function, ground floor area, building area, number of basements, number of floors, 
                     building orientation (all experiments) 
Output Layer         All output indicators  All output indicators  WWR           Urinals, Toilet Flush, Washbasins 
Hidden Layer         1                      7                      7             7 
Epoch                50                     50                     50            50 
Batch Size           -                      31                     31            31 
Activation Function  ReLU                   ReLU                   ReLU          ReLU 
MSE                  5,119,586              443,484                128.9         1.3 


The developed ML model can be used to generate a prediction for building design parameters with 
the building geometry data as the input. As an example, consider a 9-story office building with a 2-floor 
basement, a GFA of 1,180 m², a building area of 13,000 m², and an east-west orientation. The results 
generated by the predictive model are 22% WWR, 4.94 W/m²K for glazing type, 350 lux for indoor lighting, 
25 °C for temperature, 60% for humidity, 40 dB for noise, 5.6 L/min for washbasin, 3.9 L/flush for toilet 
flush, 0.9 L/flush for urinals, 650 m² for rainwater harvesting, and 354 m² for landscape area. 


[Figure: ANN diagram; legible input nodes include building function, ground floor area, and building area, 
and legible output nodes include toilet flush] 

Figure 3. The structure of the ANN algorithm 


Figure 4. The structure of the IF/ELSE algorithm 


4. CONCLUSION 

The ML model for GB design was developed by considering the availability of existing data; as a 
result, the four GB design factors used in the ML model are energy efficiency, indoor environmental 
quality, water efficiency, and site planning. Moreover, the predictive model combining the IF-ELSE and 
ANN algorithms, with an MSE of 1.3, was the most accurate. However, since this study is limited by the 
amount of GB data collected, future studies are encouraged to develop a more robust ML model with 
improved accuracy by collecting more GB data. Furthermore, further research is needed to create an 
ML-based GB design application tool that eases designers' tasks during the conceptual 
phase of GB design. 